METHOD AND APPARATUS FOR CONTROL OF RECEIVE DATA
Background of the Invention 

The invention relates generally to network data
processing.

Networking products such as routers require high 
speed components for packet data movement, i.e., collecting 
packet data from incoming network device ports and queuing 
the packet data for transfer to appropriate forwarding 
device ports. They also require high-speed special 
controllers for processing the packet data, that is, parsing 
the data and making forwarding decisions. Because the 
implementation of these high-speed functions usually 
involves the development of ASIC or custom devices, such 
networking products are of limited flexibility. For 
example, each controller is assigned to service network
packets from one or more given ports on a permanent
basis.

Summary of the Invention 



In one aspect of the invention, receiving data from
a network includes issuing a receive request directing the
transfer of data from one of a plurality of device ports
to a buffer memory and specifying a thread from among a
plurality of processing threads to process the data.

Brief Description of the Drawings 

Other features and advantages of the invention will 
be apparent from the following description taken together 
with the drawings in which: 

FIG. 1 is a block diagram of a communication system 
employing a hardware-based multi-threaded processor; 

FIG. 2 is a block diagram of a microengine employed
in the hardware-based multi-threaded processor of FIG. 1;

FIG. 3 is an illustration of an exemplary thread 
task assignment; 

FIG. 4 is a block diagram of an I/O bus interface 
shown in FIG. 1; 

FIG. 5 is a detailed diagram of a bus interface unit 
employed by the I/O bus interface of FIG. 4; 

FIGS. 6A-6F are illustrations of various bus
configuration control and status registers (CSRs);

FIG. 7A is a detailed diagram illustrating the
interconnection between a plurality of 10/100 Ethernet
("slow") ports and the bus interface unit;

FIG. 7B is a detailed diagram illustrating the
interconnection between two Gigabit Ethernet ("fast") ports
and the bus interface unit;

FIGS. 8A-8C are illustrations of the formats of the
RCV_RDY_CTL, RCV_RDY_HI and RCV_RDY_LO CSR registers,
respectively;

FIG. 9 is a depiction of the receive threads and
their interaction with the I/O bus interface during a
receive process;

FIGS. 10A and 10B are illustrations of the format of
the RCV_REQ FIFO and the RCV_CTL FIFO, respectively; and

FIG. 11 is an illustration of the thread done
registers.

Detailed Description 

Referring to FIG. 1, a communication system 10
includes a parallel, hardware-based multi-threaded processor
12. The hardware-based multi-threaded processor 12 is
coupled to a first peripheral bus (shown as a PCI bus) 14, a
second peripheral bus referred to as an I/O bus 16 and a
memory system 18. The system 10 is especially useful for
tasks that can be broken into parallel subtasks or
functions. The hardware-based multi-threaded processor 12
includes multiple microengines 22, each with multiple
hardware controlled program threads that can be
simultaneously active and independently work on a task. In
the embodiment shown, there are six microengines 22a-22f and
each of the six microengines is capable of processing four
program threads, as will be described more fully below.

The hardware-based multi-threaded processor 12 also
includes a processor 23 that assists in loading microcode
control for other resources of the hardware-based
multi-threaded processor 12 and performs other general purpose
computer type functions such as handling protocols and
exceptions, and providing extra support for packet processing
where the microengines pass the packets off for more detailed
processing. In one embodiment, the processor 23 is a
StrongARM (ARM is a trademark of ARM Limited, United
Kingdom) core based architecture. The processor (or core)
23 has an operating system through which the processor 23
can call functions to operate on the microengines 22a-22f.
The processor 23 can use any supported operating system,
preferably a real-time operating system. For the core
processor implemented as a StrongARM architecture, operating
systems such as Microsoft NT real-time, VXWorks and µCUS, a
freeware operating system available over the Internet, can
be used.

The six microengines 22a-22f each operate with
shared resources including the memory system 18, a PCI bus
interface 24 and an I/O bus interface 28. The PCI bus
interface 24 provides an interface to the PCI bus 14. The I/O
bus interface 28 is responsible for controlling and
interfacing the processor 12 to the I/O bus 16. The memory
system 18 includes a Synchronous Dynamic Random Access
Memory (SDRAM) 18a, which is accessed via an SDRAM
controller 26a, a Static Random Access Memory (SRAM) 18b,
which is accessed using an SRAM controller 26b, and a
nonvolatile memory (shown as a FlashROM) 18c that is used
for boot operations. The SDRAM 18a and SDRAM controller 26a
are typically used for processing large volumes of data,
e.g., processing of payloads from network packets. The SRAM
18b and SRAM controller 26b are used in a networking
implementation for low latency, fast access tasks, e.g.,
accessing look-up tables, memory for the processor 23, and
so forth. The microengines 22a-22f can execute memory
reference instructions to either the SDRAM controller 26a or
the SRAM controller 26b.

The hardware-based multi-threaded processor 12
interfaces to network devices such as a media access
controller device, including a "slow" device 30 (e.g.,
10/100BaseT Ethernet MAC) and/or a "fast" device 31, such as
Gigabit Ethernet MAC, ATM device or the like, over the I/O
Bus 16. In the embodiment shown, the slow device 30 is a
10/100 BaseT Octal MAC device and thus includes 8 slow ports
32a-32h, and the fast device is a Dual Gigabit MAC device
having two fast ports 33a, 33b. Each of the network devices
attached to the I/O Bus 16 can include a plurality of ports
to be serviced by the processor 12. Other devices, such as
a host computer (not shown), that may be coupled to the PCI
bus 14 are also serviced by the processor 12. In general,
as a network processor, the processor 12 can interface to
any type of communication device or interface that
receives/sends large amounts of data. The processor 12
functioning as a network processor could receive units of
packet data from the devices 30, 31 and process those units
of packet data in a parallel manner, as will be described.
The unit of packet data could include an entire network
packet (e.g., Ethernet packet) or a portion of such a
packet.

Each of the functional units of the processor 12 is
coupled to one or more internal buses. The internal buses
include an internal core bus 34 (labeled "AMBA") for
coupling the processor 23 to the memory controllers 26a, 26b
and to an AMBA translator 36. The processor 12 also
includes a private bus 38 that couples the microengines
22a-22f to the SRAM controller 26b, AMBA translator 36 and the
Fbus interface 28. A memory bus 40 couples the memory
controllers 26a, 26b to the bus interfaces 24, 28 and the
memory system 18.

Referring to FIG. 2, an exemplary one of the
microengines 22a-22f is shown. The microengine 22a includes
a control store 70 for storing a microprogram. The
microprogram is loadable by the central processor 23. The
microengine 22a also includes control logic 72. The control
logic 72 includes an instruction decoder 73 and program
counter units 72a-72d. The four program counters are
maintained in hardware. The microengine 22a also includes
context event switching logic 74. The context event
switching logic 74 receives messages (e.g.,
SEQ_#_EVENT_RESPONSE; FBI_EVENT_RESPONSE;
SRAM_EVENT_RESPONSE; SDRAM_EVENT_RESPONSE; and
AMBA_EVENT_RESPONSE) from each one of the shared resources,
e.g., SRAM controller 26b, SDRAM controller 26a, or processor
core 23, control and status registers, and so forth. These
messages provide information on whether a requested function
has completed. Depending on whether or not the function
requested by a thread has completed and signaled completion,
the thread either needs to wait for that completion signal
or, if the thread is enabled to operate, is placed on an
available thread list (not shown). As earlier mentioned, the
microengine 22a can have a maximum of 4 threads of execution
available.

In addition to event signals that are local to an 
executing thread, the microengine employs signaling states 
that are global. With signaling states, an executing thread 
can broadcast a signal state to all microengines 22. Any 
and all threads in the microengines can branch on these 
signaling states. These signaling states can be used to 
determine availability of a resource or whether a resource
is due for servicing.

The context event logic 74 has arbitration for the 
four threads. In one embodiment, the arbitration is a round 
robin mechanism. However, other arbitration techniques, 
such as priority queuing or weighted fair queuing, could be 
used. The microengine 22a also includes an execution box
(EBOX) data path 76 that includes an arithmetic logic unit 
(ALU) 76a and a general purpose register (GPR) set 76b. The 
ALU 76a performs arithmetic and logical functions as well as 
shift functions. 
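
The round robin arbitration mentioned above can be illustrated with a
minimal sketch. It is not taken from the described hardware; the
function name round_robin_pick and the list-of-booleans representation
of thread readiness are assumptions made purely for illustration.

```python
def round_robin_pick(ready, last):
    """Return the index of the next ready thread after `last`.

    `ready` is a list of booleans, one per thread context (four in the
    embodiment described). `last` is the index of the thread that ran
    most recently. Returns None when no thread is ready.
    Hypothetical helper, not part of the described apparatus.
    """
    n = len(ready)
    # Start with the thread after `last` and wrap around, so the thread
    # that just ran is considered last of all.
    for step in range(1, n + 1):
        candidate = (last + step) % n
        if ready[candidate]:
            return candidate
    return None
```

A priority or weighted scheme would replace only the search order in
the loop; the interface could stay the same.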

The microengine 22a further includes a write 
transfer register file 78 and a read transfer register file 
80. The write transfer register file 78 stores data to be 
written to a resource. The read transfer register file 80 
is for storing return data from a resource. Subsequent to 
or concurrent with the data arrival, an event signal from 
the respective shared resource, e.g., memory controllers 
26a, 26b, or core 23, will be provided to the context event 
arbiter 74, which in turn alerts the thread that the data is 
available or has been sent. Both transfer register files 
78, 80 are connected to the EBOX 76 through a data path. In 
the described implementation, each of the register files
includes 64 registers.

The functionality of the microengine threads is 
determined by microcode loaded (via the core processor) for 
a particular user's application into each microengine's
control store 70. Referring to FIG. 3, an exemplary thread
task assignment 90 is shown. Typically, one of the
microengine threads is assigned to serve as a receive
scheduler 92 and another as a transmit scheduler 94. A
plurality of threads are configured as receive processing 
threads 96 and transmit processing (or "fill") threads 98. 
Other thread task assignments include a transmit arbiter 100 
and one or more core communication threads 102. Once 
launched, a thread performs its function independently. 

The receive scheduler thread 92 assigns packets to
receive processing threads 96. In a packet forwarding
application for a bridge/router, for example, the receive
processing thread parses packet headers and performs lookups
based on the packet header information. Once the receive
processing thread or threads 96 has processed the packet, it
either sends the packet as an exception to be further
processed by the core 23 (e.g., the forwarding information
cannot be located in lookup and the core processor must
learn it), or stores the packet in the SDRAM and queues the
packet in a transmit queue by placing a packet link
descriptor for it in a transmit queue associated with the
transmit (forwarding) port indicated by the header/lookup.
The transmit queue is stored in the SRAM. The transmit
arbiter thread 100 prioritizes the transmit queues and the
transmit scheduler thread 94 assigns packets to transmit
processing threads that send the packet out onto the
forwarding port indicated by the header/lookup information
during the receive processing.

The receive processing threads 96 may be dedicated 
to servicing particular ports or may be assigned to ports 
dynamically by the receive scheduler thread 92. For certain 
system configurations, a dedicated assignment may be 
desirable. For example, if the number of ports is equal to 
the number of receive processing threads 96, then it may be 
quite practical as well as efficient to assign the receive 
processing threads to ports in a one-to-one, dedicated 
assignment. In other system configurations, a dynamic 
assignment may provide a more efficient use of system
resources.

The receive scheduler thread 92 maintains scheduling
information 104 in the GPRs 76b of the microengine within
which it executes. The scheduling information 104 includes
thread capabilities information 106, port-to-thread
assignments (list) 108 and "thread busy" tracking
information 110. At minimum, the thread capabilities
information informs the receive scheduler thread as to the
type of tasks for which the other threads are configured,
e.g., which threads serve as receive processing threads.
Additionally, it may inform the receive scheduler of other
capabilities that may be appropriate to the servicing of a
particular port. For instance, a receive processing thread
may be configured to support a certain protocol, or a
particular port or ports. A current list of the ports to
which active receive processing threads have been assigned
by the receive scheduler thread is maintained in the
thread-to-port assignments list 108. The thread busy mask register
110 indicates which threads are actively servicing a port.
The receive scheduler uses all of this scheduling
information in selecting threads to be assigned to ports
that require service for available packet data, as will be
described in further detail below.
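
As a rough illustration of how the scheduling information might be
combined, the following sketch selects a receive processing thread for
a port. The function pick_thread, its parameters, and the encoding of
the thread busy mask as one bit per thread ID are all assumptions for
illustration; the description above does not specify the selection
algorithm.

```python
def pick_thread(port, receive_threads, busy_mask, dedicated=None):
    """Choose an idle receive processing thread for `port`.

    receive_threads: thread IDs configured as receive processing
        threads (standing in for the thread capabilities information).
    busy_mask: integer with bit N set when thread N is busy
        (standing in for the thread busy mask; encoding is assumed).
    dedicated: optional {port: thread_id} map for dedicated assignment.
    Returns a thread ID, or None if no suitable thread is idle.
    """
    if dedicated and port in dedicated:
        tid = dedicated[port]
        # A dedicated thread is usable only when it is not busy.
        return None if busy_mask & (1 << tid) else tid
    # Dynamic assignment: take the first idle receive thread.
    for tid in receive_threads:
        if not busy_mask & (1 << tid):
            return tid
    return None
```

With dedicated set to None the sketch behaves as a purely dynamic
assignment; supplying a complete map gives the one-to-one case.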

Referring to FIG. 4, the I/O bus interface 28
includes shared resources 120, which are coupled to a
push/pull engine interface 122 and a bus interface unit 124.
The bus interface unit 124 includes a ready bus controller
126 connected to a ready bus 128 and an Fbus controller 130
for connecting to a portion of the I/O bus referred to as an
Fbus 132. Collectively, the ready bus 128 and the Fbus 132
make up the signals of the I/O bus 16 (FIG. 1). The
resources 120 include two FIFOs, a transmit FIFO 134 and a
receive FIFO 136, as well as CSRs 138, a scratchpad memory
140 and a hash unit 142. The Fbus 132 transfers data
between the ports of the devices 30, 31 and the I/O bus
interface 28. The ready bus 128 is an 8-bit bus that
performs several functions. It is used to read control
information about data availability from the devices 30, 31,
e.g., in the form of ready status flags. It also provides
flow control information to the devices 30, 31, and may be
used to communicate with another network processor 12 that
is connected to the Fbus 132. Both buses 128, 132 are
accessed by the microengines 22 through the CSRs 138. The
CSRs 138 are used for bus configuration, for accessing the
bus interface unit 124, and for inter-thread signaling.
They also include several counters and thread status
registers, as will be described. The CSRs 138 are accessed
by the microengines 22 and the core 23. The receive FIFO
(RFIFO) 136 includes data buffers for holding data received
from the Fbus 132 and is read by the microengines 22. The
transmit FIFO (TFIFO) 134 includes data buffers that hold
data to be transmitted to the Fbus 132 and is written by the
microengines 22. The scratchpad memory 140 is accessed by
the core 23 and microengines 22, and supports a variety of
operations, including read and write operations, as well as
bit test, bit test/clear and increment operations. The hash
unit 142 generates hash indexes for 48-bit or 64-bit data
and is accessed by the microengines 22 during lookup
operations.

The core 23 and the microengines 22 issue commands to the
push/pull engine interface 122 when accessing one of the
resources 120. The push/pull engine interface 122 places
the commands into queues (not shown), arbitrates which
commands to service, and moves data between the resources
120, the core 23 and the microengines 22. In addition to
servicing requests from the core 23 and microengines 22, the
push/pull engine interface 122 also services requests from
the ready bus 128 to transfer control information to a
register in the microengine read transfer registers 80.

When a thread issues a request to a resource 120, a
command is driven onto an internal command bus 150 and
placed in queues within the push/pull engine interface 122.
Receive/read-related instructions (such as instructions for
reading the CSRs) are written to a "push" command queue.

The CSRs 138 include the following types of
registers: Fbus receive and transmit registers; Fbus and
ready bus configuration registers; ready bus control
registers; hash unit configuration registers; interrupt
registers; and several miscellaneous registers, including
thread status registers. Those of the registers which
pertain to the receive process will be described in further
detail.

The interrupt/signal registers include an
INTER_THD_SIG register for inter-thread signaling. Any
thread within the microengines 22 or the core 23 can write a
thread number to this register to signal an inter-thread
event.

Further details of the Fbus controller 130 and the
ready bus controller 126 are shown in FIG. 5. The ready bus
controller 126 includes a programmable sequencer 160 for
retrieving MAC device status information from the MAC
devices 30, 31, and asserting flow control to the MAC
devices over the ready bus 128 via ready bus interface logic
161. The Fbus controller 130 includes Fbus interface logic
162, which is used to transfer data to and from the devices
30, 31 and is controlled by a transmit state machine (TSM) 164
and a receive state machine (RSM) 166. In the embodiment
herein, the Fbus 132 may be configured as a bidirectional
64-bit bus, or two dedicated 32-bit buses. In the
unidirectional, 32-bit configuration, each of the state
machines owns its own 32-bit bus. In the bidirectional
configuration, the ownership of the bus is established
through arbitration. Accordingly, the Fbus controller 130
further includes a bus arbiter 168 for selecting which state
machine owns the Fbus 132.

Some of the relevant CSRs used to program and
control the ready bus 128 and Fbus 132 for receive processes
are shown in FIGS. 6A-6F. Referring to FIG. 6A,
RDYBUS_TEMPLATE_PROGx registers 170 are used to store
instructions for the ready bus sequencer. Each of these
32-bit registers 170a, 170b, 170c includes four 8-bit
instruction fields 172. Referring to FIG. 6B, a
RCV_RDY_CTL register 174 specifies the behavior of the
receive state machine 166. The format is as follows: a
reserved field (bits 31:15) 174a; a fast port mode field
(bits 14:13) 174b, which specifies the fast (Gigabit) port
thread mode, as will be described; an autopush prevent
window field (bits 12:10) 174c for specifying the autopush
prevent window used by the ready bus sequencer to prevent
the receive scheduler from accessing its read transfer
registers when an autopush operation (which pushes
information to those registers) is about to begin; an
autopush enable (bit 9) 174d, used to enable autopush of the
receive ready flags; another reserved field (bit 8) 174e; an
autopush destination field (bits 7:6) 174f for specifying an
autopush operation's destination register; a signal thread
enable field (bit 5) 174g which, when set, indicates the
thread to be signaled after an autopush operation; and a
receive scheduler thread ID (bits 4:0) 174h, which specifies
the ID of the microengine thread that has been configured as
a receive scheduler.
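
The RCV_RDY_CTL bit layout given above can be restated as a short
field-extraction sketch. The bit positions come from the description;
the dictionary keys and the helper names extract and
decode_rcv_rdy_ctl are illustrative assumptions and are not register
or API names from the described apparatus.

```python
# (high bit, low bit) for each non-reserved RCV_RDY_CTL field,
# as listed in the description of FIG. 6B.
RCV_RDY_CTL_FIELDS = {
    "fast_port_mode": (14, 13),
    "autopush_prevent_window": (12, 10),
    "autopush_enable": (9, 9),
    "autopush_destination": (7, 6),
    "signal_thread_enable": (5, 5),
    "rcv_scheduler_thread_id": (4, 0),
}


def extract(value, hi, lo):
    """Return bits hi:lo (inclusive) of a 32-bit register value."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def decode_rcv_rdy_ctl(value):
    """Split a raw RCV_RDY_CTL value into its named fields."""
    return {
        name: extract(value, hi, lo)
        for name, (hi, lo) in RCV_RDY_CTL_FIELDS.items()
    }
```

Keeping the layout in a table rather than in hand-written shifts makes
it straightforward to check each entry against the field list above.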

Referring to FIG. 6C, a REC_FASTPORT_CTL register
176 is relevant to receiving packet data from fast ports
(fast port mode) only. It enables receive threads to view
the current header and body thread assignments
for the two fast ports, as will be described. It includes
the following fields: a reserved field (bits 31:20) 176a;
an FP2_HDR_THD_ID field (bits 19:15) 176b, which specifies
the fast port 2 header receive (processing) thread ID; an
FP2_BODY_THD_ID field (bits 14:10) 176c for specifying the
fast port 2 body receive processing thread ID; an
FP1_HDR_THD_ID field (bits 9:5) 176d for specifying the fast
port 1 header receive processing thread ID; and an
FP1_BODY_THD_ID field (bits 4:0) 176e for specifying the
fast port 1 body processing thread ID. The manner in which
these fields are used by the RSM 166 will be described in
detail later.
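
To make the REC_FASTPORT_CTL layout concrete, the sketch below packs
and unpacks the four 5-bit thread ID fields at the bit positions given
above. The function names and argument order are assumptions chosen
for illustration only.

```python
THREAD_ID_BITS = 5
THREAD_ID_MASK = (1 << THREAD_ID_BITS) - 1  # 0x1F


def pack_rec_fastport_ctl(fp2_hdr, fp2_body, fp1_hdr, fp1_body):
    """Build a REC_FASTPORT_CTL value from four thread IDs.

    Bits 19:15 FP2_HDR_THD_ID, 14:10 FP2_BODY_THD_ID,
    bits 9:5 FP1_HDR_THD_ID, 4:0 FP1_BODY_THD_ID.
    """
    for tid in (fp2_hdr, fp2_body, fp1_hdr, fp1_body):
        if not 0 <= tid <= THREAD_ID_MASK:
            raise ValueError("thread ID must fit in 5 bits")
    return (fp2_hdr << 15) | (fp2_body << 10) | (fp1_hdr << 5) | fp1_body


def unpack_rec_fastport_ctl(value):
    """Return (fp2_hdr, fp2_body, fp1_hdr, fp1_body) from a raw value."""
    return (
        (value >> 15) & THREAD_ID_MASK,
        (value >> 10) & THREAD_ID_MASK,
        (value >> 5) & THREAD_ID_MASK,
        value & THREAD_ID_MASK,
    )
```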

Although not depicted in detail, other bus registers
include the following: a RDYBUS_TEMPLATE_CTL register 178
(FIG. 6D), which maintains the control information for the
ready bus and the Fbus controllers (for example, it enables
the ready bus sequencer); a RDYBUS_SYNCH_COUNT_DEFAULT
register 180 (FIG. 6E), which specifies the program cycle
rate of the ready bus sequencer; and an FP_FASTPORT_CTL
register 182 (FIG. 6F), which specifies how many Fbus clock
cycles the RSM 166 must wait between the last data transfer
and the next sampling of fast receive status, as will be
described.

Referring to FIG. 7A, the MAC device 30 provides
transmit status flags 200 and receive status flags 202 that
indicate whether the amount of data in an associated
transmit FIFO 204 or receive FIFO 206 has reached a certain
threshold level. The ready bus sequencer 160 periodically
polls the ready flags (after selecting either the receive
ready flags 202 or the transmit ready flags 200 via a flag
select 208) and places them into appropriate ones of the
CSRs 138 by transferring the flag data over ready bus data
lines 209. In this embodiment, the ready bus includes 8
data lines for transferring flag data from each port to the
Fbus interface unit 124. The CSRs in which the flag data
are written are defined as RCV_RDY_HI/LO registers 210 for
receive ready flags and XMIT_RDY_HI/LO registers 212 for
transmit ready flags, if the ready bus sequencer 160 is
programmed to execute receive and transmit ready flag read
instructions, respectively.

When the ready bus sequencer is programmed with an
appropriate instruction directing it to interrogate MAC
receive ready flags, it reads the receive ready flags from
the MAC device or devices specified in the instruction and
places the flags into an RCV_RDY_HI register 210a and an
RCV_RDY_LO register 210b, collectively, RCV_RDY registers
210. Each bit in these registers corresponds to a different
device port on the I/O bus.
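
Because each bit of the RCV_RDY registers corresponds to one port, the
set of ports with data pending can be recovered by scanning for set
bits. The sketch below assumes, purely for illustration, that
RCV_RDY_LO carries ports 0 to 31 and RCV_RDY_HI carries ports 32 to
63; the description does not state the bit-to-port mapping, and the
function name ready_ports is likewise hypothetical.

```python
def ready_ports(rcv_rdy_lo, rcv_rdy_hi=0, bits_per_register=32):
    """List port numbers whose receive ready flag is set.

    Assumed mapping (not from the description): bit N of RCV_RDY_LO is
    port N, and bit N of RCV_RDY_HI is port N + bits_per_register.
    """
    combined = (rcv_rdy_hi << bits_per_register) | rcv_rdy_lo
    return [
        bit
        for bit in range(2 * bits_per_register)
        if (combined >> bit) & 1
    ]
```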

As shown in FIG. 7B, the bus interface
unit 124 also supports two fast port receive ready flag pins
FAST_RX1 214a and FAST_RX2 214b for the two fast ports of
the fast MAC device 31. These fast port receive ready flag
pins are read by the RSM 166 directly and placed into an
RCV_RDY_CNT register 216.

The RCV_RDY_CNT register 216 is one of several used 
by the receive scheduler to determine how to issue a receive 
request. It also indicates whether a flow control request 
is issued. 

Referring to FIG. 8A, the format of the RCV_RDY_CNT
register 216 is as follows: bits 31:28 are defined as a
reserved field 216a; bit 27 is defined as a ready bus master
field 216b and is used to indicate whether the ready bus 128
is configured as a master or slave; a field corresponding to
bit 26 216c provides flow control information; bits 25 and
24 correspond to FRDY2 field 216d and FRDY1 field 216e,
respectively. The FRDY2 216d and FRDY1 216e are used to
store the values of the FAST_RX2 pin 214b and FAST_RX1 pin
214a, respectively, both of which are sampled by the RSM 166
each Fbus clock cycle; bits 23:16 correspond to a reserved
field 216f; a receive request count field (bits 15:8) 216g
specifies a receive request count, which is incremented
after the RSM 166 completes a receive request and data is
available in the RFIFO 136; and a receive ready count field
(bits 7:0) 216h specifies a receive ready count, an 8-bit
counter that is incremented each time the ready bus
sequencer 160 writes the ready bus registers RCV_RDY_CNT
register 216, the RCV_RDY_LO register 210b and RCV_RDY_HI
register 210a to the receive scheduler read transfer
registers.

There are two techniques for reading the ready bus
registers: "autopush" and polling. The autopush instruction
may be executed by the ready bus sequencer 160 during a
receive process (rxautopush) or a transmit process
(txautopush). Polling requires that a microengine thread
periodically issue read references to the I/O bus interface
28.

The rxautopush operation performs several functions.
It increments the receive ready count in the RCV_RDY_CNT
register 216. If enabled by the RCV_RDY_CTL register 174,
it automatically writes the RCV_RDY_CNT 216, the RCV_RDY_LO
and RCV_RDY_HI registers 210b, 210a to the receive scheduler
read transfer registers and signals to the receive scheduler
thread 92 (via a context event signal) when the rxautopush
operation is complete.

The ready bus sequencer 160 polls the MAC FIFO
status flags periodically and asynchronously to other events
occurring in the processor 12. Ideally, the rate at which
the MAC FIFO ready flags are polled is greater than the
maximum rate at which the data is arriving at the MAC ports.
Thus, it is necessary for the receive scheduler thread 92 to
determine whether the MAC FIFO ready flags read by the ready
bus sequencer 160 are new, or whether they have been read
already. The rxautopush instruction increments the receive
ready count in the RCV_RDY_CNT register 216 each time the
instruction executes. The RCV_RDY_CNT register 216 can be
used by the receive scheduler thread 92 to determine whether
the state of specific flags has to be evaluated or whether
they can be ignored because receive requests have been
issued and the port is currently being serviced. For
example, if the FIFO threshold for a Gigabit Ethernet port
is set so that the receive ready flags are asserted when 64
bytes of data are in the MAC receive FIFO 206, then the
state of the flags does not change until the next 64 bytes
arrive 5120 ns later. If the ready bus sequencer 160 is
programmed to collect the flags four times each 5120 ns
period, the next three sets of ready flags that are to be
collected by the ready bus sequencer 160 can be ignored.
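
The use of the receive ready count to skip stale flag sets can be
sketched as follows. The count occupies bits 7:0 of RCV_RDY_CNT as
described; the function names and the idea of recording the count at
the moment a request is issued are assumptions made for illustration.

```python
READY_COUNT_MASK = 0xFF  # receive ready count is bits 7:0, an 8-bit counter


def receive_ready_count(rcv_rdy_cnt):
    """Extract the receive ready count from a raw RCV_RDY_CNT value."""
    return rcv_rdy_cnt & READY_COUNT_MASK


def flags_need_evaluation(rcv_rdy_cnt, count_at_request, polls_to_ignore):
    """Decide whether a pushed flag set should be evaluated.

    count_at_request: receive ready count recorded when the last
        receive request for the port was issued (assumed bookkeeping).
    polls_to_ignore: number of subsequent flag sets known to be stale,
        e.g. 3 when flags are collected four times per arrival period.
    The subtraction is done modulo 256 so counter wraparound is handled.
    """
    elapsed = (receive_ready_count(rcv_rdy_cnt) - count_at_request) & READY_COUNT_MASK
    return elapsed > polls_to_ignore
```

For the four-samples-per-period example, a request issued at count 254
would cause the sets pushed at counts 255, 0 and 1 to be ignored and
the set at count 2 to be evaluated.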

When the receive ready count is used to monitor the
freshness of the receive ready flags, there is a possibility
that the receive ready flags will be ignored when they are
providing new status. For a more accurate determination of
ready flag freshness, the receive request count may be used.
Each time a receive request is completed and the receive
control information is pushed onto the RCV_CNTL register
232, the RSM 166 increments the receive request count.
The count is recorded in the RCV_RDY_CNT register the first
time the ready bus sequencer executes an rxrdy instruction
for each program loop. The receive scheduler thread 92 can
use this count to track how many requests the receive state
machine has completed. As the receive scheduler thread
issues commands, it can maintain a list of the receive
requests it submits and the ports associated with each such
request.
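
One way to picture the bookkeeping described above is a small tracker
that pairs the scheduler's list of submitted requests with the receive
request count (bits 15:8 of RCV_RDY_CNT). The class RequestTracker and
its methods are hypothetical, and the sketch assumes requests complete
in the order submitted, which the description does not state
explicitly.

```python
from collections import deque


class RequestTracker:
    """Track which submitted receive requests the RSM has completed."""

    def __init__(self):
        self.pending = deque()  # ports, in submission order
        self.last_count = 0     # last receive request count observed

    def submit(self, port):
        """Record that a receive request was issued for `port`."""
        self.pending.append(port)

    def update(self, rcv_rdy_cnt):
        """Return ports whose requests completed since the last update."""
        count = (rcv_rdy_cnt >> 8) & 0xFF
        # Modulo-256 difference tolerates wraparound of the 8-bit count.
        completed = (count - self.last_count) & 0xFF
        self.last_count = count
        done = []
        for _ in range(min(completed, len(self.pending))):
            done.append(self.pending.popleft())
        return done
```

Flags for a port still present in pending could then be treated as
already accounted for, while flags for other ports would be new.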

Referring to FIGS. 8B and 8C, the registers 
RCV_RDY_HI 210a and RCV_RDY_LO 210b have a flag bit 217a, 
217b, respectively, corresponding to each port. 

Referring to FIG. 9, the receive scheduler thread 92
performs its tasks as quickly as possible to ensure that the
RSM 166 is always busy, that is, that there is always a
receive request waiting to be processed by the RSM 166.
Several tasks performed by the receive scheduler 92 are as
follows. The receive scheduler 92 determines which ports
need to be serviced by reading the RCV_RDY_HI, RCV_RDY_LO
and RCV_RDY_CNT registers 210a, 210b and 216, respectively.
The receive scheduler 92 also determines which receive ready
flags are new and which are old using either the receive
request count or the receive ready count in the RCV_RDY_CNT
register, as described above. It tracks the thread
processing status of the other microengine threads by
reading thread done status CSRs 240. The receive scheduler
thread 92 initiates transfers across the Fbus 132 via the
ready bus, while the receive state machine 166 performs the
actual read transfer on the Fbus 132. The receive scheduler
92 interfaces to the receive state machine 166 through two
FBI CSRs 138: an RCV_REQ register 230 and an RCV_CNTL
register 232. The RCV_REQ register 230 instructs the
receive state machine on how to receive data from the Fbus
132.

Still referring to FIG. 9, a process of initiating
an Fbus receive transfer is shown. Having received ready
status information from the RCV_RDY_HI/LO registers 210a,
210b as well as thread availability from the thread done
register 240 (transaction "1", as indicated by the arrow
labeled 1), the receive scheduler thread 92 determines if
there is room in the RCV_REQ FIFO 230 for another receive
request. If it determines that the RCV_REQ FIFO 230 has room to
receive a request, the receive scheduler thread 92 writes a
receive request by pushing data into the RCV_REQ FIFO 230
(transaction 2). The RSM 166 processes the request in the
RCV_REQ FIFO 230 (transaction 3). The RSM 166 responds to
the request by moving the requested data into the RFIFO 136
(transaction 4), writing associated control information to
the RCV_CNTL FIFO 232 (transaction 5) and generating a
start_receive signal event to the receive processing thread
96 specified in the receive request (transaction 6). The
RFIFO 136 includes 16 elements 241, each element for storing
a 64-byte segment of data referred to herein as a MAC packet
("MPKT"). The RSM 166 reads packets from the MAC ports in
fragments equal in size to one or two RFIFO elements, that
is, MPKTs. The specified receive processing thread 96
responds to the signal event by reading the control
information from the RCV_CNTL register 232 (transaction 7).
It uses the control information to determine, among other
pieces of information, where the data is located in the
RFIFO 136. The receive processing thread 96 reads the data
from the RFIFO 136 on quadword boundaries into its read
transfer registers or moves the data directly into the SDRAM
(transaction 8).
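By way of illustration only, transactions 2 through 8 may be modeled in software roughly as follows. This is a minimal sketch and not the hardware implementation: the class ReceivePath, its method names and the dictionary-based entries are hypothetical, and the FIFO depths (two and four entries) are those given for the RCV_REQ and RCV_CNTL FIFOs elsewhere in this description.

```python
from collections import deque

RCV_REQ_DEPTH = 2     # two-entry RCV_REQ FIFO
RCV_CNTL_DEPTH = 4    # four-entry RCV_CNTL FIFO
RFIFO_ELEMENTS = 16   # sixteen RFIFO elements
MPKT_BYTES = 64       # each element holds one 64-byte MPKT


class ReceivePath:
    """Toy model of transactions 2 through 8 (all names hypothetical)."""

    def __init__(self):
        self.rcv_req = deque()
        self.rcv_cntl = deque()
        self.rfifo = [b""] * RFIFO_ELEMENTS
        self.signaled = []  # thread IDs that were sent start_receive

    def issue_request(self, port, element, thread_id):
        """Transaction 2: the scheduler pushes a request if there is room."""
        if len(self.rcv_req) >= RCV_REQ_DEPTH:
            return False
        self.rcv_req.append({"port": port, "element": element, "tid": thread_id})
        return True

    def rsm_step(self, port_data):
        """Transactions 3 to 6: the RSM services the oldest request."""
        if not self.rcv_req or len(self.rcv_cntl) >= RCV_CNTL_DEPTH:
            return False  # nothing pending, or stalled on a full control FIFO
        req = self.rcv_req.popleft()
        mpkt = port_data[req["port"]][:MPKT_BYTES]
        self.rfifo[req["element"]] = mpkt                 # transaction 4
        self.rcv_cntl.append({"element": req["element"],  # transaction 5
                              "tid": req["tid"],
                              "nbytes": len(mpkt)})
        self.signaled.append(req["tid"])                  # transaction 6
        return True

    def thread_read(self):
        """Transactions 7 and 8: read control information, then the data."""
        ctl = self.rcv_cntl.popleft()
        return ctl["tid"], self.rfifo[ctl["element"]][:ctl["nbytes"]]
```

In this sketch a third call to issue_request() made before the RSM has serviced a request returns False, mirroring the scheduler's check for room in the RCV_REQ FIFO.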

The RCV_REQ register 230 is used to initiate a
receive transfer on the Fbus and is mapped to a two-entry
FIFO that is written by the microengines. The I/O bus
interface provides signals (not shown) to the receive
scheduler thread indicating that the RCV_REQ FIFO 230 has
room available for another receive request and that the last
issued receive request has been stored in the RCV_REQ
register 230.

Referring to FIG. 10A, the RCV_REQ FIFO 230 includes
two entries 231. The format of each entry 231 is as
follows. The first two bits correspond to a reserved field
230a. Bit 29 is an FA field 230b for specifying the maximum
number of Fbus accesses to be performed for this request. A
THMSG field (bits 28:27) 230c is a two-bit thread message
field that allows the scheduler thread to pass a message to
the assigned receive thread through the receive state machine,
which copies this message to the RCV_CNTL register. An SL
field 230d (bit 26) is used in cases where status
information is transferred following the EOP MPKT. It
indicates whether one or two 32-bit bus accesses are
required in a 32-bit Fbus configuration. An E1 field 230e
(bits 21:18) and an E2 field (bits 25:22) 230f specify the
RFIFO element to receive the transferred data. If only one
MPKT is received, it is placed in the element indicated by
the E1 field. If two MPKTs are received, then the second
MPKT is placed in the RFIFO element indicated by the E2
field. An FS field (bits 17:16) 230g specifies use of a
fast or slow port mode, that is, whether the request is
directed to a fast or slow port. The fast port mode setting
signifies to the RSM that a sequence number is to be
associated with the request and that it will be handling
speculative requests, which will be discussed in further
detail later. An NFE field (bit 15) 230h specifies the
number of RFIFO elements to be filled (i.e., one or two
elements). The IGFR field (bit 13) 230i is used only if
fast port mode is selected and indicates to the RSM that it
should process the request regardless of the status of the
fast ready flag pins. An SIGRS field (bit 11) 230j, if set,
indicates that the receive scheduler is to be signaled upon
completion of the receive request. A TID field (bits 10:6)
230k specifies the receive thread to be notified or signaled
after the receive request is processed. Therefore, if bit
11 is set, the RCV_REQ entry must be read twice, once by the
receive thread and once by the receive scheduler thread,
before it can be removed from the RCV_REQ FIFO. An RM field
(bits 5:3) 230l specifies the ID of the MAC device that has
been selected by the receive scheduler. Lastly, an RP field
(bits 2:0) 230m specifies which port of the MAC device
specified in the RM field 230l has been selected.
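For illustration, the entry format just described can be expressed as a table of bit positions together with hypothetical helper functions for composing and decomposing a 32-bit entry. This is a sketch under stated assumptions: the names RCV_REQ_FIELDS, pack_rcv_req and unpack_rcv_req are not part of the described apparatus, and bits 14 and 12, which the text does not describe, are simply left at zero.

```python
# (low bit, width) for each RCV_REQ field named in the text.
# Bits 31:30 are reserved; bits 14 and 12 are not described and stay zero.
RCV_REQ_FIELDS = {
    "FA":    (29, 1),  # maximum number of Fbus accesses
    "THMSG": (27, 2),  # two-bit message passed on to the receive thread
    "SL":    (26, 1),  # status length after the EOP MPKT
    "E2":    (22, 4),  # RFIFO element for the second MPKT
    "E1":    (18, 4),  # RFIFO element for the first MPKT
    "FS":    (16, 2),  # fast or slow port mode
    "NFE":   (15, 1),  # number of RFIFO elements to fill
    "IGFR":  (13, 1),  # ignore fast ready flag pins
    "SIGRS": (11, 1),  # also signal the receive scheduler
    "TID":   (6, 5),   # receive thread to be signaled
    "RM":    (3, 3),   # MAC device ID
    "RP":    (0, 3),   # port within the MAC device
}


def pack_rcv_req(**fields):
    """Build a 32-bit RCV_REQ entry from named field values."""
    word = 0
    for name, value in fields.items():
        low, width = RCV_REQ_FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bit(s)")
        word |= value << low
    return word


def unpack_rcv_req(word):
    """Split a 32-bit RCV_REQ entry back into its named fields."""
    return {name: (word >> low) & ((1 << width) - 1)
            for name, (low, width) in RCV_REQ_FIELDS.items()}
```

For example, pack_rcv_req(E1=3, TID=5, RM=1, RP=2) places the first MPKT in RFIFO element 3 and names thread 5 as the thread to be signaled; unpack_rcv_req() recovers the same values.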

The RSM 166 reads the RCV_REQ register entry 231 to
determine how it should receive data from the Fbus 132, that
is, how the signaling should be performed on the Fbus, where
the data should be placed in the RFIFO and which microengine
thread should be signaled once the data is received. The
RSM 166 looks for a valid receive request in the RCV_REQ
FIFO 230. It selects the MAC device identified in the RM
field and selects the specified port within the MAC by
asserting the appropriate control signals. It then begins
receiving data from the MAC device on the Fbus data lines.
The receive state machine always attempts to read either
eight or nine quadwords of data from the MAC device on the
Fbus as specified in the receive request. If the MAC device
asserts the EOP signal, the RSM 166 terminates the receive
early (before eight or nine accesses are made). The RSM 166
calculates the total bytes received for each receive request
and reports the value in the RCV_CNTL register 232. If EOP
is received, the RSM 166 determines the number of valid
bytes in the last received data cycle.
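The early-termination and byte-counting behavior described above may be sketched as follows. The function receive_accesses and its argument format are hypothetical. The sketch assumes a quadword of eight bytes (so that eight quadwords fill one 64-byte RFIFO element) and assumes that every data cycle other than the EOP cycle carries a full quadword of valid data.

```python
QUADWORD_BYTES = 8  # assumption: eight quadwords fill one 64-byte element


def receive_accesses(cycles, max_accesses=8):
    """Model one receive request.

    cycles: iterable of (valid_bytes, eop) tuples, one per Fbus data cycle.
    max_accesses: eight or nine, as specified in the receive request.
    Returns (total_bytes, accesses_made, eop_seen).
    """
    total = 0
    made = 0
    for valid_bytes, eop in cycles:
        if made == max_accesses:
            break              # request satisfied without seeing EOP
        made += 1
        if eop:
            total += valid_bytes   # only part of the last cycle may be valid
            return total, made, True
        total += QUADWORD_BYTES
    return total, made, False
```

With EOP asserted on the third cycle and three valid bytes in that cycle, the sketch reports 19 bytes after three accesses rather than continuing to the eighth access.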

The RCV_CNTL register 232 is mapped to a four-entry
FIFO (referred to herein as RCV_CNTL FIFO 232) that is
written by the receive state machine and read by the
microengine thread. The I/O bus interface 28 signals the
assigned thread when a valid entry reaches the top of the
RCV_CNTL FIFO. When a microengine thread reads the RCV_CNTL
register, the data is popped off the FIFO. If the SIGRS
field 230j is set in the RCV_REQ register 230, the receive
scheduler thread 92 specified in the RCV_CNTL register 232
is signaled in addition to the thread specified in TID field
230k. In this case, the data in the RCV_CNTL register 232
is read twice before the receive request data is retired
from the RCV_CNTL FIFO 232 and the next thread is signaled.
The receive state machine writes to the RCV_CNTL register 232
as long as the FIFO is not full. If the RCV_CNTL FIFO 232 is
full, the receive state machine stalls and stops accepting
any more receive requests.
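The pop-on-read behavior, including the case in which an entry is read twice before it is retired, may be illustrated by the following sketch. The class RcvCntlFifo and its methods are hypothetical names chosen for illustration, and the per-entry read counter is merely one possible way of modeling the described behavior.

```python
from collections import deque


class RcvCntlFifo:
    """Toy model of the four-entry RCV_CNTL FIFO (names hypothetical)."""

    DEPTH = 4

    def __init__(self):
        self._entries = deque()

    def __len__(self):
        return len(self._entries)

    def write(self, control_word, sigrs=False):
        """RSM side. Returns False when the FIFO is full (the RSM stalls)."""
        if len(self._entries) >= self.DEPTH:
            return False
        # With SIGRS set, both the receive thread and the scheduler read it.
        reads_needed = 2 if sigrs else 1
        self._entries.append([control_word, reads_needed])
        return True

    def read(self):
        """Thread side. Returns the head entry, retiring it after its
        last required read. Raises IndexError if the FIFO is empty."""
        entry = self._entries[0]
        entry[1] -= 1
        if entry[1] == 0:
            self._entries.popleft()
        return entry[0]
```

An entry written with sigrs=True thus remains at the head of the FIFO after the first read and is retired only by the second.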

Referring to FIG. 10B, the RCV_CNTL FIFO 232
provides instruction to the signaled thread (i.e., the
thread specified in TID) to process the data. As indicated
above, the RCV_CNTL FIFO includes 4 entries 233. The format
of the RCV_CNTL FIFO entry 233 is as follows: a THMSG field
(31:30) 232a includes the 2-bit message copied by the RSM
from RCV_REQ register [28:27]. A MACPORT/THD field (bits
29:24) 232b specifies either the MAC port number or a
receive thread ID, as will be described in further detail
below. An SOP_SEQ field (23:20) 232c is used for fast ports
and indicates a packet sequence number as an SOP
(start-of-packet) sequence number if the SOP was asserted during the
receive data transfer and indicates an MPKT sequence number
if SOP was not so asserted. An RF field 232d and RERR field
232e (bits 19 and 18, respectively) both convey receive
error information. An SE field 232f (17:14) and an FE field
232g (13:10) are copies of the E2 and E1 fields,
respectively, of the RCV_REQ. An EF field (bit 9) 232h
specifies the number of RFIFO elements which were filled by
the receive request. An SN field (bit 8) 232i is used for
fast ports and indicates whether the sequence number
specified in SOP_SEQ field 232c is associated with fast port
1 or fast port 2. A VLD_BYTES field (7:2) 232j specifies
the number of valid bytes in the RFIFO element if the
element contains an EOP MPKT. An EOP field (bit 1) 232k
indicates that the MPKT is an EOP MPKT. An SOP field (bit
0) 232l indicates that the MPKT is an SOP MPKT.
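As an illustration of how a signaled thread might interpret an entry, the following sketch decodes a 32-bit RCV_CNTL word using the bit positions listed above. The names RCV_CNTL_FIELDS and decode_rcv_cntl are hypothetical, and the MACPORT/THD field is written MACPORT_THD here only so that it forms a convenient key.

```python
# (name, high bit, low bit) for each RCV_CNTL field named in the text.
RCV_CNTL_FIELDS = [
    ("THMSG", 31, 30),        # message copied from RCV_REQ bits 28:27
    ("MACPORT_THD", 29, 24),  # MAC port number or receive thread ID
    ("SOP_SEQ", 23, 20),      # SOP or MPKT sequence number (fast ports)
    ("RF", 19, 19),           # receive error information
    ("RERR", 18, 18),         # receive error information
    ("SE", 17, 14),           # copy of E2 from the request
    ("FE", 13, 10),           # copy of E1 from the request
    ("EF", 9, 9),             # number of RFIFO elements filled
    ("SN", 8, 8),             # fast port 1 or fast port 2
    ("VLD_BYTES", 7, 2),      # valid bytes in an EOP MPKT
    ("EOP", 1, 1),            # MPKT is an EOP MPKT
    ("SOP", 0, 0),            # MPKT is an SOP MPKT
]


def decode_rcv_cntl(word):
    """Split a 32-bit RCV_CNTL entry into its named fields."""
    fields = {}
    for name, high, low in RCV_CNTL_FIELDS:
        width = high - low + 1
        fields[name] = (word >> low) & ((1 << width) - 1)
    return fields
```

A thread using such a decoder would, for example, take the FE value as the RFIFO element holding the first MPKT and consult VLD_BYTES only when EOP is set.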

FIG. 11 illustrates the format of the thread done
registers 240 and their interaction with the receive
scheduler and processing threads 92, 96, respectively, of
the microengines 22. The thread done registers 240 include
a first thread status register, TH_DONE_REG0 240a, which has
2-bit status fields 241a corresponding to each of threads 0
through 15. A second thread status register, TH_DONE_REG1
240b, has 2-bit status fields 241b corresponding to each of
threads 16 through 23. These registers can be read and
written to by the threads using a CSR instruction (or fast
write instruction, described below). The receive scheduler
thread can use these registers to determine which RFIFO
elements are not in use. Since it is the receive scheduler
thread 92 that assigns receive processing threads 96 to
process the data in the RFIFO elements, and it also knows
the thread processing status from the THREAD_DONE_REG0 and
THREAD_DONE_REG1 registers 240a, 240b, it can determine
which RFIFO elements are currently available.
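One way in which a scheduler might combine its own assignment records with the thread status to find available RFIFO elements is sketched below. The function free_rfifo_elements and its data structures are hypothetical. The sketch assumes the example status encoding given in TABLE 1 of this description, treating only the values 01 and 11 as idle; since the encoding is defined in software, an actual scheduler may treat the codes differently.

```python
RFIFO_ELEMENTS = 16
IDLE_CODES = (0b01, 0b11)  # assumption: example encoding of TABLE 1


def free_rfifo_elements(assignments, thread_status):
    """Return the RFIFO elements not held by a busy thread.

    assignments: dict mapping thread ID -> list of RFIFO elements
        that the scheduler handed to that thread.
    thread_status: dict mapping thread ID -> 2-bit status value
        (a thread with no recorded status is treated as busy).
    """
    in_use = set()
    for tid, elements in assignments.items():
        if thread_status.get(tid, 0b00) not in IDLE_CODES:
            in_use.update(elements)
    return [e for e in range(RFIFO_ELEMENTS) if e not in in_use]
```

For instance, if thread 4 holds elements 0 and 1 and is still busy while thread 5 has reported idle, elements 0 and 1 are excluded and all others are returned.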

The THREAD_DONE CSRs 240 support a two-bit message
for each microengine thread. The assigned receive thread
may write a two-bit message to this register to indicate
that it has completed its task. Each time a message is
written to the THREAD_DONE register, the current message is
logically ORed with the new message. The bit values in the
THREAD_DONE registers are cleared by writing a "1", so the
scheduler may clear the messages by writing the data read
back to the THREAD_DONE register. The definition of the
2-bit status field is determined in software. An example of
four message types is illustrated in TABLE 1 below.

2-BIT MESSAGE    DEFINITION

00               Busy.

01               Idle, processing complete.

10               Not busy, but waiting to finish
                 processing of entire packet.

11               Idle, processing complete for an EOP
                 MPKT.

TABLE 1

The assigned receive processing threads write their status 
to the THREAD_DONE register whenever the status changes. 
For example, a thread may immediately write 00 to the 
THREAD_DONE register after the receive state machine signals
the assigned thread. When the receive scheduler thread 
reads the THREAD_DONE register, it can look at the returned 
value to determine the status of each thread and then update 
its thread/port assignment list. 
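The OR-on-write and write-one-to-clear behavior of the THREAD_DONE registers may be illustrated as follows. This is a sketch under stated assumptions: the names field_location and ThreadDoneReg are hypothetical, the placement of thread n's field at bit position 2n within its register is assumed rather than taken from the description, and the thread write and the scheduler clear are modeled as two separate operations.

```python
THREADS_IN_REG0 = 16  # register 0 covers threads 0-15, register 1 threads 16-23


def field_location(thread_id):
    """Return (register index, bit shift) of a thread's 2-bit field.

    Assumption: fields are packed in thread order from bit 0 upward.
    """
    if not 0 <= thread_id <= 23:
        raise ValueError("thread_id must be in the range 0..23")
    reg, slot = divmod(thread_id, THREADS_IN_REG0)
    return reg, 2 * slot


class ThreadDoneReg:
    """Toy model of one thread done register (names hypothetical)."""

    def __init__(self):
        self.value = 0

    def thread_write(self, shift, message):
        """A thread's message is ORed with the current contents."""
        self.value |= (message & 0b11) << shift

    def read(self):
        return self.value

    def scheduler_clear(self, mask):
        """Writing a 1 to a bit position clears that bit."""
        self.value &= ~mask
```

In this model the scheduler clears exactly the messages it has seen by passing the value it read to scheduler_clear(); a message written by a thread between the read and the clear is not lost unless it sets the same bits.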

The microengine supports a fast_wr instruction that
improves performance when writing to a subset of CSR
registers. The fast_wr instruction does not use the push or
pull engines. Rather, it uses logic that services the
instruction as soon as the write request is issued to the
FBI CSR. The instruction thus eliminates the need for the
pull engine to read data from a microengine transfer
register when it processes the command. The meaning of the
10-bit immediate data for some of the CSRs is shown below.

CSR                    10-BIT IMMEDIATE DATA

INTER_THD_SIG          Thread number of the thread that
                       is to be signaled.

THREAD_DONE            A 2-bit message that is shifted
                       into a position relative to the
                       thread that is writing the
                       message.

THREAD_DONE_INCR1      Same as THREAD_DONE except that
THREAD_DONE_INCR2      either the enqueue_seq1 or
                       enqueue_seq2 is also incremented.

INCR_ENQ_NUM1          Write a one to increment the
INCR_ENQ_NUM2          enqueue sequence number by one.

TABLE 2

It will be appreciated that the receive process as
described herein assumes that no packet exemptions occurred,
that is, that the threads are able to handle the packet
processing without assistance from the core processor.
Further, the receive process as described also assumes the
availability of FIFO space. It will be appreciated that the
various state machines must determine if there is room
available in a FIFO, e.g., the RFIFO, prior to writing new
entries to that FIFO. If a particular FIFO is full, the
state machine will wait until the appropriate number of
entries has been retired from that FIFO.

Additions, subtractions, and other modifications of
the preferred embodiments of the invention will be apparent
to those practiced in this field and are within the scope of
the following claims.

What is claimed is:



1. A method of receiving data from a network,
comprising:
    issuing a receive request directing a transfer of data
from one of a plurality of device ports to a buffer memory and
specifying a thread from among a plurality of processing program
threads to process the data.

2. The method of claim 1, further comprising:
    determining if any of the plurality of device ports
coupled to the network require service.

3. The method of claim 2, further comprising:
    transferring the data to the buffer memory and signaling
to the specified program thread that the data is ready for
processing.

4. The method of claim 2, wherein determining
comprises:
    interrogating the plurality of device ports to identify
which of the plurality of device ports require service.

5. The method of claim 4, wherein determining further
comprises:
    preparing control information corresponding to those
device ports identified as requiring service.

6. The method of claim 5, wherein the control
information comprises receive ready flags each associated with a
device port receive FIFO in a corresponding one of the device
ports.

7. The method of claim 6, wherein interrogating
comprises:
    polling the state of the ready flags to determine if the
ready flags are asserted, the assertion of the ready flags
indicating that the corresponding device ports have data ready
for transfer.

8. The method of claim 7, wherein the receive ready
flags indicate that the associated device port receive FIFO has
reached a threshold level of fullness.

9. The method of claim 8, wherein the receive ready
flags indicate that the associated device port receive FIFO
stores a full network packet.

10. The method of claim 5, further comprising:
    maintaining a receive ready count, the receive ready
count being incremented when the control information is prepared.

11. The method of claim 5, wherein preparing control
information further comprises:
    writing a flag to a control and status register for each
device port in the plurality of device ports that is determined
to require service.

12. The method of claim 11, wherein issuing comprises:
    obtaining the control information from the control and
status register; and
    selecting from each device port in the plurality of
device ports having set bits in the control and status register a
port for servicing.

13. The method of claim 12, wherein issuing further
comprises:
    determining which among the plurality of program threads
is available; and
    assigning an available program thread to the selected
port.

14. The method of claim 12, wherein selecting a port
comprises:
    using the receive ready count to determine if the ready
flags reflect current status of the device port.

15. The method of claim 3, further comprising:
    maintaining a receive request count for counting transfer
of data to the buffer memory, the receive request count being
incremented by one upon the transfer of the data to the buffer
memory and signaling to the specified program thread.

16. The method of claim 15, wherein selecting a port
further comprises:
    using the receive request count to determine if the ready
flags reflect current status of the device ports.

17. A method of receiving data from a plurality of
peripheral ports, comprising:
    determining that the one of the plurality of peripheral
ports requires servicing;
    issuing a receive request based on the determination, the
receive request directing the transfer of data from the one of
the plurality of peripheral ports to a buffer memory and
specifying a program thread from among a plurality of
processing program threads to process the data; and
    transferring the data to the buffer memory and signaling
to the specified thread that the data is ready for processing.

18. An article comprising a computer-readable medium
which stores computer-executable instructions for receiving data
from a plurality of ports, the instructions causing a computer
to:
    issue a receive request directing a transfer of data from
one of a plurality of device ports to a buffer memory and
specifying a program thread from among a plurality of processing
program threads to process the data.

19. The article of claim 18, the article further
comprises instructions causing a computer to:
    determine if any of the plurality of device ports coupled
to the network require service.

20. The article of claim 19, the article further
comprises instructions causing a computer to:
    transfer the data to the buffer memory and signal to the
specified program thread that the data is ready for processing.

21. The article of claim 19, wherein the instructions to
determine comprise instructions causing a computer to:
    interrogate the plurality of device ports to identify
which of the plurality of device ports require service; and
    prepare control information corresponding to those device
ports identified as requiring service.

22. The article of claim 21, the article further
comprises instructions causing a computer to:
    maintain a receive ready count, the receive ready count
being incremented when the control information is prepared.

23. The article of claim 22, wherein the instructions to
issue comprise instructions causing a computer to:
    use the receive ready count to check the current status
of the device port.

24. The article of claim 19, the article further comprises
instructions causing a computer to:
    maintain a receive request count for counting transfer of
data to the buffer memory, the receive request count being
incremented by one upon the transfer of the data to the buffer
memory and signaling to the specified program thread.

25. The article of claim 24, wherein the instructions to
issue comprise instructions causing a computer to:
    use the receive request count to check the current status
of the device ports.

26. A network processor comprising:
    a microengine for executing threads, the threads
including a receive scheduler program thread and receive
processing program threads; and
    the receive scheduler thread assigning a port to one of
the receive processing program threads if the port has available
data.

27. The network processor, further wherein the receive
scheduler program thread directs transfer of the data
to the assigned one of the receive processing program threads for
processing.

28. The network processor, further comprising:
    an interface coupled to the microengine for receiving
data from the port, the interface for indicating to the receive
scheduler program thread whether the port has data available for
processing by one of the receive processing program threads.

Abstract of the Disclosure

A network processor that has multiple processing 
elements, each supporting multiple simultaneous program threads 
with access to shared resources in an interface. Control logic 
in the interface samples the ready state of network ports and 
forwards the ready state information to a scheduler program
thread. The scheduler program thread issues receive request
commands that direct the interface to fetch segments of data from
a selected one of the network ports into a receive FIFO for 
processing by an assigned one of a plurality of receive 
processing program threads. For each segment of data that is to 
be transferred to the receive FIFO, a control FIFO is loaded with 
control information specifying the associated receive FIFO 
location(s), the selected one of the network ports and the
assigned receive processing program thread. The request is 
processed by reading the control information, transferring the 
data indicated by the request to the receive FIFO and loading 
another control FIFO with control information specifying how the 
data is to be processed by the assigned receive processing 
program thread. The interface signals the assigned receive 
processing program thread to process the data. 